Contents of Glottometrics 10, 2005 (including abstracts)

Authors

  • Joseph F. Kess
  • Katsuo Tamaoka
  • Gabriel Altmann
Abstract

This paper traces the historical development of kanji, the Chinese characters used in the Japanese orthographic system. The paper outlines the structural principles that underlie their composition, both as single kanji and in combination in compound words. The discussion also covers their usage and frequency, as well as the various script reforms that have affected their number and deployment. Lastly, it offers commentary on the role of kanji in the development of Japanese psycholinguistics and on the relevance of this work to psychological studies of language in general.

Katsuo Tamaoka, Gabriel Altmann. Mathematical Modelling for Japanese Kanji Strokes in Relation to Frequency, Asymmetry and Readings, pp. 16-29

Abstract: The present study investigates the relationship between the number of strokes in Japanese kanji and their printed frequency of occurrence, compositional asymmetry and multiple readings. First, the distributions of kanji strokes in both the sample of 1,945 basic kanji and the sample of 6,355 kanji appearing in the Asahi Newspaper published between 1985 and 1998 follow a negative hypergeometric distribution, as demonstrated in Figure 1. The distribution of strokes of the 1,945 kanji in relation to their printed frequencies is rather irregular, as shown in Figure 2, but a rough-fitting model is drawn in Figure 3. Mathematical modelling of kanji strokes in relation to lexical compositional asymmetry reveals an interesting tendency toward regressive compounding: the greater the number of strokes in a kanji, the more it tends to form two-kanji compound words by adding a kanji on the right side of the target kanji, as shown in Figure 4. Because a kanji often has multiple readings, this study also examines the number of readings in relation to the number of strokes. As shown in Figure 6, the greater the number of strokes, the fewer the readings. In other words, the more visually complex a kanji is, the more specialised its reading becomes. Kanji strokes, as one of the central characteristics of kanji, are thus closely related to other properties such as frequency, asymmetry and readings, and the present study uses mathematical modelling to indicate these relations.
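The negative hypergeometric model named in the Tamaoka and Altmann abstract can be made concrete with a short sketch. It assumes the (K, M, n) parameterisation often used in quantitative linguistics: P(x) = C(M+x-1, x) C(K-M+n-x-1, n-x) / C(K+n-1, n) for x = 0, ..., n, which is equivalent to a beta-binomial. It also assumes a hypothetical 1-displaced variant that maps a stroke count s to x = s - 1. The abstract does not state which exact form the paper fitted or what its fitted K, M and n were. The function names and parameter values below are illustrative only.

```python
import math


def log_rising_binom(a: float, k: int) -> float:
    """log C(a + k - 1, k) = log Gamma(a + k) - log Gamma(a) - log k!, valid for real a > 0."""
    return math.lgamma(a + k) - math.lgamma(a) - math.lgamma(k + 1)


def neg_hypergeom_pmf(x: int, K: float, M: float, n: int) -> float:
    """P(X = x) for x = 0..n under the assumed (K, M, n) parameterisation, K > M > 0."""
    if not 0 <= x <= n:
        return 0.0
    # Work in log space so non-integer K and M (and large n) stay numerically stable.
    log_p = (log_rising_binom(M, x)
             + log_rising_binom(K - M, n - x)
             - log_rising_binom(K, n))
    return math.exp(log_p)


def stroke_distribution(K: float, M: float, max_strokes: int) -> dict:
    """Hypothetical 1-displaced variant: stroke count s maps to x = s - 1, s = 1..max_strokes."""
    n = max_strokes - 1
    return {s: neg_hypergeom_pmf(s - 1, K, M, n) for s in range(1, max_strokes + 1)}


if __name__ == "__main__":
    # Placeholder parameters, NOT values fitted in the paper.
    probs = stroke_distribution(K=12.0, M=4.0, max_strokes=30)
    mode = max(probs, key=probs.get)
    print(f"most probable stroke count: {mode}, total probability: {sum(probs.values()):.6f}")
```

Fitting the model to real stroke counts would mean estimating K and M (for example by minimising a chi-square discrepancy against observed class frequencies). That step is not shown here.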
Hisashi Masuda, Terry Joyce. A Database of Two-Kanji Compound Words Featuring Morphological Family, Morphological Structure, and Semantic Category Data, pp. 30-44

Abstract: One of the most fundamental issues for all models of the mental lexicon is how to represent essential information about the morphological structure of polymorphemic words. This paper describes the construction of a large-scale database of two-kanji compound words. The database supplements a central component of data on 78,426 compound headwords from the Kojien dictionary with several components focusing on morphological family, morphological structure, and semantic category data. It will be a particularly valuable resource for supporting and extending research into the lexical retrieval and representation of two-kanji compound words within the Japanese mental lexicon from the perspective of compound word morphology. One example is the series of constituent-morpheme priming experiments (Joyce, 1999, 2002, 2003a, 2003b, 2004; Joyce & Masuda, 2004), which the paper discusses briefly.

Yayoi Miyaoka, Katsuo Tamaoka. A Corpus Investigation of the Right-hand Head Rule Applied to Japanese Affixes, pp. 45-54

Abstract: The present study investigates differences between Japanese prefixes and suffixes using editions of the Asahi Newspaper published between 1985 and 1998 (Amano & Kondo, 2000). The right-hand head rule (e.g., Kageyama, 1982; Kageyama, 1999; Namiki, 1982; Nishigauchi, 2004; Williams, 1981) predicts that prefixes attach to a wide variety of nouns while suffixes regularly attach to a smaller group of nouns. Twenty-four frequently used affixes (12 prefixes and 12 suffixes) were compared on 7 corpus features: printed frequency, productivity, accumulative productivity, commonality, coalescence degree, Herdan’s logarithmic type-token ratio (log TTR), and entropy. A series of Mann-Whitney U-tests on six of these features (printed frequency, productivity, accumulative productivity, commonality, coalescence degree and log TTR) revealed no differences between the 12 prefixes and the 12 suffixes. However, the t-test for entropy indicated a significant difference. This suggests that the prefixes were attached to nouns more randomly, or chaotically, than the suffixes. Although the present findings are limited to the 24 selected affixes, the result supports the right-hand head rule.
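Two of the seven corpus features in the Miyaoka and Tamaoka abstract, Herdan’s log TTR and entropy, reduce to short formulas over the list of nouns an affix attaches to. The sketch below assumes Herdan’s C = log V / log N (V types, N tokens) and Shannon entropy in bits over the relative frequencies of the attached nouns. The abstract does not specify the paper’s exact operationalisation, such as the log base or token versus type counting. The token lists in the example are invented.

```python
import math
from collections import Counter


def herdan_log_ttr(tokens: list) -> float:
    """Herdan's C = log V / log N, with V distinct types and N tokens (needs N >= 2)."""
    n = len(tokens)
    if n < 2:
        raise ValueError("Herdan's C needs at least two tokens")
    return math.log(len(set(tokens))) / math.log(n)


def shannon_entropy(tokens: list, base: float = 2.0) -> float:
    """H = -sum(p * log p) over the relative frequencies of the distinct tokens."""
    counts = Counter(tokens)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


if __name__ == "__main__":
    # Invented token lists: each entry is one occurrence of the affix with that noun.
    affix_a = ["n1", "n2", "n3", "n4", "n5", "n6"]   # spread evenly over many nouns
    affix_b = ["n1", "n1", "n1", "n1", "n2", "n2"]   # concentrated on few nouns
    for name, toks in [("affix_a", affix_a), ("affix_b", affix_b)]:
        print(name, round(herdan_log_ttr(toks), 3), round(shannon_entropy(toks), 3))
```

Higher entropy means the affix’s attachments are spread more evenly across nouns, which is the sense in which the abstract calls prefix attachment more random. To compare the 12 prefixes with the 12 suffixes on a feature, `scipy.stats.mannwhitneyu` is one standard implementation of the U-test.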
Eric Long, Shoichi Yokoyama. Text genre and kanji frequency, pp. 55-72

Abstract: This study explores various ways of using kanji frequency lists derived from multiple corpora to characterise kanji usage within those corpora. First, we discuss the scope of, and issues in processing, four corpora derived from commercially available CD-ROMs: two encyclopedias, a database of newspaper articles, and a four-CD-ROM collection of the texts of mostly fictional paperback books. Next, a summary of the kanji frequency data is given, and we point out that the frequency distribution differs noticeably from a classic Zipf’s law distribution. The standard set of Jōyō kanji is compared with the high-frequency kanji in the corpora, and the degrees of similarity among the corpora are obtained with the Chi-square By Degrees of Freedom (CBDF) measure proposed by Kilgarriff (1997). Finally, a simple method for identifying kanji that have a high frequency in a particular corpus compared to their cross-corpus frequency is tried and evaluated.
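The CBDF similarity measure cited in the Long and Yokoyama abstract can be sketched roughly as follows. Take the `top_k` most frequent items of the two corpora combined, compute a chi-square statistic over the resulting 2 × k table, and divide by its k - 1 degrees of freedom; lower values mean more similar frequency profiles. This is a simplified, assumed reading of Kilgarriff (1997). His procedure compares equal-sized corpus samples and fixes the number of top items. Here, expected counts come from each corpus’s share of the combined token total, and the `top_k` default is arbitrary. The function name and example tallies are hypothetical.

```python
from collections import Counter


def cbdf(freq_a: Counter, freq_b: Counter, top_k: int = 500) -> float:
    """Chi-square over the top_k items of the joint list, divided by (k - 1) degrees of freedom."""
    size_a, size_b = sum(freq_a.values()), sum(freq_b.values())
    if size_a == 0 or size_b == 0:
        raise ValueError("both corpora need at least one token")
    share_a = size_a / (size_a + size_b)     # assumed basis for expected counts
    joint = freq_a + freq_b
    items = [item for item, _ in joint.most_common(top_k)]
    if len(items) < 2:
        raise ValueError("need at least two items for a positive df")
    chi2 = 0.0
    for item in items:
        o_a, o_b = freq_a[item], freq_b[item]   # Counter returns 0 for absent items
        total = o_a + o_b
        e_a, e_b = total * share_a, total * (1 - share_a)
        chi2 += (o_a - e_a) ** 2 / e_a + (o_b - e_b) ** 2 / e_b
    return chi2 / (len(items) - 1)


if __name__ == "__main__":
    # Invented kanji tallies for two hypothetical corpora.
    news = Counter({"日": 120, "人": 90, "国": 80, "年": 70})
    fiction = Counter({"日": 60, "人": 110, "国": 20, "年": 40})
    print(round(cbdf(news, fiction), 3))
```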
Katsuo Tamaoka, Chizuko Matsuoka, Hiromu Sakai, Shogo Makioka. Predicting Attachment of the Light Verb –suru to Japanese Two-kanji Compound Words Using Four Aspects, pp. 73-81

Abstract: In Japanese, the light verb –suru can be attached to various two-kanji compound words containing a verb-like feature (or aspect) so that they can be used as verbs. The present study uses a large sample of 2,000 two-kanji compound words, accounting for a little less than 80 percent of all two-kanji compound words printed in 14 years of Asahi Newspaper issues. It investigates how well light verb attachment is predicted by four aspects: inchoative, durative, telic and stative. A binary logistic regression analysis indicates that all four aspects are significant predictors. Among them, the telic aspect shows overwhelmingly high predictive power. A quantification theory type III analysis further demonstrates that, in contrast to the stative aspect, the inchoative, durative and telic aspects share a similar semantic feature of time series. The telic aspect, however, overlaps not only with the time-series feature of the inchoative and durative aspects but also with the stative aspect. It is therefore the most effective single predictor of light verb attachment, with an extremely high prediction rate of 93.64 percent and 1.05 percent error.
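The binary logistic regression described in the Tamaoka, Matsuoka, Sakai and Makioka abstract can be illustrated with a minimal from-scratch sketch. Each compound is coded with four binary aspect features (inchoative, durative, telic, stative), and the outcome is whether –suru attaches. The model is fitted by plain batch gradient descent on the mean log-loss and scored by its hit rate, the share of correct predictions. The eight toy rows are INVENTED so that telic alone separates the classes. They are not the study’s data, and the learning rate and epoch count are arbitrary choices.

```python
import math

# INVENTED toy data: (inchoative, durative, telic, stative) -> 1 if -suru attaches.
TOY_X = [[1, 0, 1, 0], [0, 1, 1, 0], [0, 0, 1, 1], [1, 1, 1, 0],
         [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 1], [0, 0, 0, 0]]
TOY_Y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Batch gradient descent on mean log-loss; returns [bias, w1, ..., wp]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w, x, threshold=0.5):
    return 1 if sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) >= threshold else 0


def hit_rate(w, X, y):
    """Share of correctly predicted cases (a 'prediction percentage' divided by 100)."""
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)


if __name__ == "__main__":
    w = fit_logistic(TOY_X, TOY_Y)
    print("weights [bias, inch, dur, telic, stat]:", [round(v, 2) for v in w])
    print("hit rate:", hit_rate(w, TOY_X, TOY_Y))
```

Significance tests for each aspect, as reported in the paper, require standard errors (for example from a statistics package), which this sketch does not compute.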
Terry Joyce. Constructing a Large-Scale Database of Japanese Word Associations, pp. 82-99

Abstract: For cognitive scientists investigating the nature of lexical knowledge, one essential task is to map out the rich networks of associations that exist between words. This paper reports on a project to construct a large-scale database of word association norms for basic Japanese vocabulary. The project also uses the database to develop lexical association network maps that tap into important aspects of words and their connectivity. The Japanese word association database will complement existing databases of the lexical features of Japanese vocabulary, such as familiarity ratings and frequency counts (Amano & Kondo, 1999; Yokoyama, Sasahara, Nozaki & Long, 1998), and the kanji corpus research highlighted in this special issue. Part 2 of this paper outlines the construction of the database, detailing the initial collections of word association responses from two major questionnaire surveys and the current state of the database. Part 3 introduces the lexical association network maps that will be developed from the word association norm data. It also discusses some particularly promising applications of the database and the network maps in cognitive science, Japanese lexicography and language instruction.
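The word association networks described in the Joyce abstract start from stimulus-response pairs collected in questionnaires. A minimal sketch of one common way to turn such norms into a weighted, directed network follows. Each edge is weighted by forward association strength, the share of a stimulus’s responses that named a given response word. This definition is an assumption; the abstract does not say how the database weights its links. The function name and the sample responses are hypothetical.

```python
from collections import Counter, defaultdict


def association_network(responses):
    """Build {stimulus: {response: forward strength}} from (stimulus, response) pairs.

    Forward strength = (times the response was given to the stimulus)
                       / (total responses recorded for that stimulus).
    Inner dicts are ordered from strongest to weakest associate.
    """
    counts = defaultdict(Counter)
    for stimulus, response in responses:
        counts[stimulus][response] += 1
    network = {}
    for stimulus, resp_counts in counts.items():
        total = sum(resp_counts.values())
        network[stimulus] = {r: c / total for r, c in resp_counts.most_common()}
    return network


if __name__ == "__main__":
    # Invented questionnaire responses: sea -> water (x2), sea -> blue, sky -> blue.
    pairs = [("海", "水"), ("海", "水"), ("海", "青"), ("空", "青")]
    for stim, edges in association_network(pairs).items():
        print(stim, edges)
```

Because the network is directed, "海 → 青" and "空 → 青" are separate edges, and shared responses such as 青 are what connect stimuli in a network map.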

Similar sources

Häufigkeiten von Buchstaben / Graphemen / Phonemen: Konvergenzen des Rangierungsverhaltens [Frequencies of Letters / Graphemes / Phonemes: Convergences in Ranking Behaviour]

The present study raises the question of to what extent low-level linguistic units, such as letters, graphemes, sounds and phonemes, follow one and the same pattern in their frequency distribution. Based on Altmann/Lehfeldt’s (1980) study of 63 samples from 38 different languages, a separate reanalysis of the letter/grapheme vs. sound/phoneme samples is made, concentrating on the empirical entropy ...

Pacon 2005

PACON 2005: Harmonization of Port and Industry. Abstracts, table of contents.

Can simple models explain Zipf's law for all exponents?

H. Simon proposed a simple stochastic process for explaining Zipf’s law for word frequencies. Here we introduce two similar generalizations of Simon’s model that cover the same range of exponents as the standard Simon model. The mathematical approach followed minimizes the amount of mathematical background needed for deriving the exponent, compared to previous approaches to the standard Simon’s...

Some properties of the Ukrainian writing system

We investigate the grapheme–phoneme relation in Ukrainian and some properties of the Ukrainian version of the Cyrillic alphabet.

On stratification in poetry

Texts are composed of many different strata on different levels. A method is proposed to find the number of strata at the word-form level in Slovak poetry and to study the relationship between the parameters of the fitting function.

Publication date: 2012